Methodologies for Spatially Interpolating Climate Adaptation Options from Econometrics Methods
Author
Maxwell Mkondiwa
1 Introduction
This notebook provides machine learning and spatial methods for assessing the impacts of climate on yields and identifying adaptation options in a spatially explicit manner. The goal is to generate spatially differentiated (gridded) estimates of the effects of weather extremes and adaptation options.
We use publicly available datasets in India to demonstrate these models.
2 Important geospatial R packages: Terra, geodata,sf, sp
# Calibration check: Multi-arm causal RF does not yet calibration check## We use binary causal RF to do thatW_cf_sowing_binary=as.vector(LDSestim_sow$Sowing_Date_Early) # Probability random forest to create weightsW.multi_sowing.forest_binary <-regression_forest(X_cf_sowing, W_cf_sowing_binary,equalize.cluster.weights =FALSE,seed =2)W.hat.multi.all_sowing_binary <-predict(W.multi_sowing.forest_binary, estimate.variance =TRUE)$predictions# Regression forest to get expected responses Y.multi_sowing.forest_binary <-regression_forest(X_cf_sowing, Y_cf_sowing,equalize.cluster.weights =FALSE,seed =2)print(Y.multi_sowing.forest_binary)
GRF forest object of type regression_forest
Number of trees: 2000
Number of training samples: 7562
Variable importance:
1 2 3 4 5 6 7 8 9 10 11 12 13
0.000 0.000 0.001 0.022 0.012 0.647 0.195 0.000 0.066 0.001 0.001 0.005 0.001
14 15 16 17 18 19 20 21
0.001 0.004 0.002 0.024 0.015 0.001 0.001 0.002
varimp.multi_sowing_binary <-variable_importance(Y.multi_sowing.forest_binary)Y.hat.multi.all_sowing_binary <-predict(Y.multi_sowing.forest_binary, estimate.variance =TRUE)$predictions# Fit binary causal RF modelmulti_sowing.forest_binary <-causal_forest(X = X_cf_sowing, Y = Y_cf_sowing, W = W_cf_sowing_binary ,W.hat=W.hat.multi.all_sowing_binary,Y.hat=Y.hat.multi.all_sowing_binary,seed=2) varimp.multi_sowing_cf_binary <-variable_importance(multi_sowing.forest_binary)# Average treatment effectsmulti_sowing_ate_binary=average_treatment_effect(multi_sowing.forest_binary,target.sample ="overlap")multi_sowing_ate_binary
Best linear fit using forest predictions (on held-out data)
as well as the mean forest prediction as regressors, along
with one-sided heteroskedasticity-robust (HC3) SEs:
Estimate Std. Error t value Pr(>t)
mean.forest.prediction 1.069524 0.084024 12.7288 < 2.2e-16 ***
differential.forest.prediction 1.728729 0.280558 6.1617 3.782e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.01807 0.18037 0.23215 0.22897 0.27845 0.42748
3.1.2 Understanding Mechanisms
We can use the model to understand how farmers who used different amounts of fertilizer or inputs benefit from early sowing of wheat. In addition, we can investigate if farmers who are above the heat stress threshold gain from early sowing.
X_cf_sowingtau=data.frame(X_cf_sowing,tau.hat_sowing)library(ggplot2)ggplot(X_cf_sowingtau,aes(x = predictions)) +geom_histogram() +xlab('CATE') +geom_vline(xintercept =0, col ='black', linetype ='dashed') +geom_vline(xintercept = multi_sowing_ate_binary["estimate"], col ='red') +theme_bw()
# Based on classifications of treatment gainslibrary(gtools)X_cf_sowingtau$sowing_treat_bin <-quantcut(X_cf_sowingtau$predictions)table(X_cf_sowingtau$sowing_treat_bin)
The above analysis however shows only the benefits to those farmers who experienced that level of stress. What about those who didn’t experience it but could have experienced it? For a counterfactual analysis of how each farmer would have gained or lost if they had a terminal stress and had sown early assuming the same levels of use other inputs, we predict the treatment effect under the assumption that all variables are the same except for the maximum temperature which is fixed at a value above 31 (e.g., 32).
The first strategy is to translate all variables to the grid. This involves interpolation across space and using new variable names. In this case, instead of gender being a dummy, you use a proportion of female or male after interpolation.
4.4.2 Spatial Bayesian Geostatistical Gaussian Process Model [Aka Bayesian Kriging]
If one is interested in calculating other measures other than the predicted value (for example, the probability of exceeding some amount), then a Bayesian gaussian process model is the best alternative in that using Markov Chain Monte Carlo simulations one can use a probabilistic assessment.
### Bayesian kriging LDS_small <- LDS[sample(1:nrow(LDS),800),] coords=dplyr::select(LDS_small,O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude)coords=as.matrix(coords)# The public version of the data has duplicated coordinates# We need to jitter these because spatial Bayesian kriging requires unique coordinates.library(geoR)coords=jitterDupCoords(coords,min=2,max=10)coords=as.matrix(coords)# Bayesian models take much time to render. We sample 1000 observations to showcase the approachlibrary(spBayes)n.samples=1000t1 <-Sys.time()r <-1n.ltr <- r*(r+1)/2priors <-list("phi.Unif"=list(rep(1,r), rep(10,r)), "K.IW"=list(r, diag(rep(1,r))), "tau.sq.IG"=c(2, 1))starting <-list("phi"=rep(3/0.5,r), "A"=rep(1,n.ltr), "tau.sq"=1)tuning <-list("phi"=rep(0.1,r), "A"=rep(0.01, n.ltr), "tau.sq"=0.01)cf.sowing.sp <- spBayes::spSVC(yield_kgperha~1, data=LDS_small,coords=coords,starting= starting,tuning=tuning,priors=priors,cov.model="exponential",n.samples=n.samples,n.omp.threads=15,svc.cols=c("(Intercept)"))
----------------------------------------
General model description
----------------------------------------
Model fit with 800 observations.
Number of covariates 1.
Number of space varying covariates 1.
Using the exponential spatial correlation model.
Number of MCMC samples 1000.
Priors and hyperpriors:
beta flat.
K IW hyperpriors:
df: 1.00000
S:
1.000
phi Unif lower bound hyperpriors: 1.000
phi Unif upper bound hyperpriors: 10.000
tau.sq IG hyperpriors shape=2.00000 and scale=1.00000
Source compiled with OpenMP, posterior sampling is using 15 thread(s).
-------------------------------------------------
Sampling
-------------------------------------------------
Sampled: 100 of 1000, 10.00%
Report interval Metrop. Acceptance rate: 48.00%
Overall Metrop. Acceptance rate: 48.00%
-------------------------------------------------
Sampled: 200 of 1000, 20.00%
Report interval Metrop. Acceptance rate: 44.00%
Overall Metrop. Acceptance rate: 46.00%
-------------------------------------------------
Sampled: 300 of 1000, 30.00%
Report interval Metrop. Acceptance rate: 29.00%
Overall Metrop. Acceptance rate: 40.33%
-------------------------------------------------
Sampled: 400 of 1000, 40.00%
Report interval Metrop. Acceptance rate: 31.00%
Overall Metrop. Acceptance rate: 38.00%
-------------------------------------------------
Sampled: 500 of 1000, 50.00%
Report interval Metrop. Acceptance rate: 25.00%
Overall Metrop. Acceptance rate: 35.40%
-------------------------------------------------
Sampled: 600 of 1000, 60.00%
Report interval Metrop. Acceptance rate: 30.00%
Overall Metrop. Acceptance rate: 34.50%
-------------------------------------------------
Sampled: 700 of 1000, 70.00%
Report interval Metrop. Acceptance rate: 28.00%
Overall Metrop. Acceptance rate: 33.57%
-------------------------------------------------
Sampled: 800 of 1000, 80.00%
Report interval Metrop. Acceptance rate: 24.00%
Overall Metrop. Acceptance rate: 32.38%
-------------------------------------------------
Sampled: 900 of 1000, 90.00%
Report interval Metrop. Acceptance rate: 32.00%
Overall Metrop. Acceptance rate: 32.33%
-------------------------------------------------
Sampled: 1000 of 1000, 100.00%
Report interval Metrop. Acceptance rate: 29.00%
Overall Metrop. Acceptance rate: 32.00%
-------------------------------------------------
Source compiled with OpenMP, posterior sampling is using 1 thread(s).
-------------------------------------------------
Recovering beta and w
-------------------------------------------------
Sampled: 99 of 251, 39.44%
Sampled: 199 of 251, 79.28%
# Kriginglibrary(terra)library(stars)library(raster)library(gstat) # Use gstat's idw routinelibrary(sp) # Used for the spsample functionlibrary(tmap)library(geodata)# India=gadm(country="IND", level=1, path=tempdir())# plot(India)# India_State_Boundary=subset(India,India$NAME_1=="Bihar")# plot(India_State_Boundary)# library(sf)#India_State_Boundary=st_as_sf(India_State_Boundary)#India_State_Boundary_Bihar_sp=as_Spatial(India_State_Boundary_Bihar)library(spBayes)India_aoi_sf_dis_sp=as_Spatial(India_aoi_sf_dis)LDS_small_sp=SpatialPointsDataFrame(cbind(LDS_small$O.largestPlotGPS.Longitude,LDS_small$O.largestPlotGPS.Latitude),data=LDS_small,proj4string=CRS("+proj=longlat +datum=WGS84"))LDS_small_sp@bbox <- India_aoi_sf_dis_sp@bboxgrd <-as.data.frame(spsample(LDS_small_sp, "regular", n=10000))names(grd) <-c("X", "Y")coordinates(grd) <-c("X", "Y")gridded(grd) <-TRUE# Create SpatialPixel objectfullgrid(grd) <-TRUE# Create SpatialGrid objectplot(grd)
Source compiled with OpenMP, posterior sampling is using 15 thread(s).
-------------------------------------------------
Point-wise sampling of predicted w
-------------------------------------------------
Sampled: 99 of 251, 39.44%
Sampled: 199 of 251, 79.28%
# Standard deviationlibrary(rasterVis)pred.sd=pred.grid["pred.sd"]pred.sd=raster(pred.sd)pred.sd=mask(pred.sd,India_aoi_sf_dis_sp)pred.sd_plot=levelplot(pred.sd,par.settings=RdBuTheme(),contour=TRUE)pred.sd_plot
# Probability of 3 tonspred.prob_3tons=pred.grid["pred.prob_3tons"]pred.prob_3tons=raster(pred.prob_3tons)pred.prob_3tons=mask(pred.prob_3tons,India_aoi_sf_dis_sp)pred.prob_3tons_plot=levelplot(pred.prob_3tons,par.settings=RdBuTheme(),contour=TRUE)pred.prob_3tons_plot
# Probability of 5 tonspred.prob_5tons=pred.grid["pred.prob_5tons"]pred.prob_5tons=raster(pred.prob_5tons)pred.prob_5tons=mask(pred.prob_5tons,India_aoi_sf_dis_sp)pred.prob_5tons_plot=levelplot(pred.prob_5tons,par.settings=RdBuTheme(),contour=TRUE)pred.prob_5tons_plot
4.4.3 Spatial Bayesian Geoadditive Model
library(bamlss)set.seed(111)f <- yield_kgperha ~s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude)## estimate model.b <-bamlss(f, data = LDS)
## Predict for each latitude and longitudepred <-expand.grid(O.largestPlotGPS.Longitude =seq(82, 89, length =100),O.largestPlotGPS.Latitude =seq(24,28, length =100))yield_hat <-predict(b,newdata=pred)yield_hat=as.data.frame(yield_hat)pred_yield_hat=cbind(pred,yield_hat)#pred_yield_hat$sigma=NULLlibrary(terra)myras <-rast(pred_yield_hat, type="xyz")plot(myras)
## Predict for each latitude and longitudepred <-expand.grid(O.largestPlotGPS.Longitude =seq(82, 89, length =100),O.largestPlotGPS.Latitude =seq(24,28, length =100))yield_hat <-predict(b,newdata=pred)yield_hat=as.data.frame(yield_hat)pred_yield_hat=cbind(pred,yield_hat)#pred_yield_hat$sigma=NULLlibrary(terra)myras <-rast(pred_yield_hat, type="xyz")plot(myras)
There two important cases in which one may have areal data to use in climate adaptation prioritization. The first case is in which there is plot or farm level plot but that the statistical agency did not provide individual level coordinates. In these cases, the only of relating the individual observations to spatial data is through a higher level spatial level. Secondly, there are cases especially with administrative data in which the data are available only at the aggregated spatial level.
5.1 Using centroid
One can get the centroid of each polygon, e.g., then fit a geostatistical model as in the spatial Bayesian Geostatistical Gaussian Process Model or geoadditive model as above.
5.1.1 Markov Random Field (MRF) Geoaddive Structured and Unstructured Spatial Model
In the case where there are data at individual farm level albeit not have geocoordinates,one can use structured geoadditive model to ascertain the patterns of the explained and unexplained spatial effect.